Comprehensive Annotation of Multiword Expressions in a Social Web Corpus

Authors

  • Nathan Schneider
  • Spencer Onuffer
  • Nora Kazour
  • Emily Danchik
  • Michael T. Mordowanec
  • Henrietta Conrad
  • Noah A. Smith
Abstract

Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class. Here we advocate for a comprehensive annotation approach: proceeding sentence by sentence, our annotators manually group tokens into MWEs according to guidelines that cover a broad range of multiword phenomena. Under this scheme, we have fully annotated an English web corpus for multiword expressions, including those containing gaps.
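
To make the idea of grouping tokens into MWEs, including ones with gaps, more concrete, here is a minimal sketch of one possible in-memory representation. It is not the authors' annotation format: the `MWE` class, the `validate` helper, the example sentence, and the rule that a token belongs to at most one MWE are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MWE:
    """Hypothetical record: one MWE as a sorted tuple of token indices."""
    token_ids: Tuple[int, ...]

    def has_gap(self) -> bool:
        # A gap exists when the member indices are not consecutive.
        return any(b - a > 1 for a, b in zip(self.token_ids, self.token_ids[1:]))

    def gap_ids(self) -> List[int]:
        # Indices between the first and last member that are not members.
        members = set(self.token_ids)
        first, last = self.token_ids[0], self.token_ids[-1]
        return [i for i in range(first, last + 1) if i not in members]


def validate(tokens: List[str], mwes: List[MWE]) -> None:
    """Check basic well-formedness of a sentence's MWE groupings.

    Assumed rules (not taken from the source): every MWE has at least two
    tokens, indices are sorted and in range, and no token is in two MWEs.
    """
    seen = set()
    for mwe in mwes:
        ids = mwe.token_ids
        if len(ids) < 2:
            raise ValueError("an MWE needs at least two tokens")
        if list(ids) != sorted(set(ids)):
            raise ValueError("token indices must be sorted and unique")
        if ids[0] < 0 or ids[-1] >= len(tokens):
            raise ValueError("token index out of range")
        if seen & set(ids):
            raise ValueError("token assigned to more than one MWE")
        seen |= set(ids)


# Illustrative sentence (invented for this sketch, not drawn from the corpus).
tokens = ["She", "looked", "the", "word", "up", "in", "New", "York"]
mwes = [
    MWE((1, 4)),  # "looked ... up": a gappy verb-particle construction
    MWE((6, 7)),  # "New York": a contiguous proper name
]
validate(tokens, mwes)

for mwe in mwes:
    words = " ".join(tokens[i] for i in mwe.token_ids)
    gap = " ".join(tokens[i] for i in mwe.gap_ids())
    print(words, "| gap:", gap if mwe.has_gap() else "(none)")
```

Storing index tuples rather than character spans keeps gappy and contiguous expressions in the same structure, so the gap material can be derived instead of annotated separately.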

Similar resources

Can Recognising Multiword Expressions Improve Shallow Parsing?

There is significant evidence in the literature that integrating knowledge about multiword expressions can improve shallow parsing accuracy. We present an experimental study to quantify this improvement, focusing on compound nominals, proper names and adjective-noun constructions. The evaluation set of multiword expressions is derived from WordNet and the textual data are downloaded from the web...

A French Corpus Annotated for Multiword Expressions with Adverbial Function

This paper presents a French corpus annotated for multiword expressions (MWEs) with adverbial function. This corpus is designed for research on information retrieval and extraction, as well as on deep and shallow syntactic parsing. We delimit which kinds of MWEs we annotated, describe the resources and methods we used for the annotation, and briefly comment on the results. The annotated ...

PARSEME-It Corpus: An Annotated Corpus of Verbal Multiword Expressions in Italian

English. This paper describes a new language resource annotated with verbal multiword expressions (VMWEs) in Italian. The paper discusses the state of the art in VMWE identification and annotation in Italian, the methodology adopted, the various VMWE categories annotated, the corpus and the annotation process. Finally, the paper ends with results, conclusion and future work. Italiano. Questo co...

A Framework for the Classification and Annotation of Multiword Expressions in Dialectal Arabic

In this paper we describe a framework for classifying and annotating Egyptian Arabic Multiword Expressions (EMWE) in a specialized computational lexical resource. The framework intends to encompass comprehensive linguistic information for each MWE including: a. phonological and orthographic information; b. POS tags; c. structural information for the phrase structure of the expression; d. lexico...

Lexical Semantic Analysis in Natural Language Text

Computer programs that make inferences about natural language are easily fooled by the often haphazard relationship between words and their meanings. This thesis develops Lexical Semantic Analysis (LxSA), a general-purpose framework for describing word groupings and meanings in context. LxSA marries comprehensive linguistic annotation of corpora with engineering of statistical natural language ...

Publication date: 2014